Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 2891 |
| Missing cells | 48 |
| Missing cells (%) | 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 293.7 KiB |
| Average record size in memory | 104.0 B |
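The percentage and per-record figures in the dataset statistics can be recomputed from the reported counts as a consistency check. This Python sketch makes two assumptions: Missing cells (%) is missing cells divided by rows × columns, and 1 KiB = 1024 bytes. The per-column missing counts come from the variable sections of this report.

```python
# Per-column missing counts, copied from the variable sections of this report.
missing_by_column = {
    "male_population": 3,
    "female_population": 3,
    "number_of_veterans": 13,
    "foreign_born": 13,
    "average_household_size": 16,
}

n_rows, n_cols = 2891, 13
missing_cells = sum(missing_by_column.values())        # should match "Missing cells"
missing_pct = 100 * missing_cells / (n_rows * n_cols)  # share of all cells that are missing

# "Total size in memory" is itself rounded to 0.1 KiB, so this is approximate.
total_bytes = 293.7 * 1024
avg_record_bytes = total_bytes / n_rows

print(missing_cells, f"{missing_pct:.1f}%", f"{avg_record_bytes:.1f} B")  # 48 0.1% 104.0 B
```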
Variable types
| Categorical | 4 |
|---|---|
| Numeric | 8 |
| DateTime | 1 |
Alerts
| Alert | Type |
|---|---|
| updated_at has constant value "2022-12-10 21:24:34.727669" | Constant |
| city has a high cardinality: 567 distinct values | High cardinality |
| male_population is highly overall correlated with female_population and 4 other fields | High correlation |
| female_population is highly overall correlated with male_population and 4 other fields | High correlation |
| total_population is highly overall correlated with male_population and 4 other fields | High correlation |
| number_of_veterans is highly overall correlated with state and 6 other fields | High correlation |
| foreign_born is highly overall correlated with male_population and 4 other fields | High correlation |
| state is highly overall correlated with median_age and 3 other fields | High correlation |
| state_code is highly overall correlated with state and 3 other fields | High correlation |
| count is highly overall correlated with male_population and 4 other fields | High correlation |
| median_age is highly overall correlated with state and 1 other field | High correlation |
| average_household_size is highly overall correlated with state and 1 other field | High correlation |
| city is uniformly distributed | Uniform |
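The Constant and High cardinality alerts depend only on how many distinct values a column has. Below is a minimal Python sketch. It assumes a cardinality threshold of 50 distinct values, which is a guess at the profiler's default; the configuration actually used is in the config.json listed under Reproduction. `categorical_alerts` is a hypothetical helper, not a pandas-profiling function.

```python
def categorical_alerts(values, cardinality_threshold=50):
    """Return alert labels for one column (hypothetical helper).

    The default threshold of 50 is an assumption, not taken from this report.
    """
    n_distinct = len(set(values))
    if n_distinct == 1:
        return ["Constant"]
    if n_distinct > cardinality_threshold:
        return ["High cardinality"]
    return []

# Distinct-value counts from this report: city 567, state 49, updated_at 1.
print(categorical_alerts([f"city_{i}" for i in range(567)]))       # ['High cardinality']
print(categorical_alerts([f"state_{i}" for i in range(49)]))       # []
print(categorical_alerts(["2022-12-10 21:24:34.727669"] * 2891))   # ['Constant']
```

With that threshold, city (567 distinct values) is flagged, while state and state_code (49 each) are not. This is consistent with the alerts above.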
Reproduction
| Analysis started | 2022-12-10 13:30:55.321834 |
|---|---|
| Analysis finished | 2022-12-10 13:31:14.967163 |
| Duration | 19.65 seconds |
| Software version | pandas-profiling v3.5.0 |
| Download configuration | config.json |
city
Categorical
| Distinct | 567 |
|---|---|
| Distinct (%) | 19.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.7 KiB |
| Bloomington | 15 |
|---|---|
| Columbia | 15 |
| Springfield | 15 |
| Jackson | 10 |
| Norwalk | 10 |
| Other values (562) | 2826 |
Length
| Max length | 47 |
|---|---|
| Median length | 25 |
| Mean length | 9.1030785 |
| Min length | 2 |
Characters and Unicode
| Total characters | 26317 |
|---|---|
| Distinct characters | 56 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 2 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Silver Spring |
|---|---|
| 2nd row | Quincy |
| 3rd row | Hoover |
| 4th row | Rancho Cucamonga |
| 5th row | Newark |
Common Values
| Value | Count | Frequency (%) |
| Bloomington | 15 | 0.5% |
| Columbia | 15 | 0.5% |
| Springfield | 15 | 0.5% |
| Jackson | 10 | 0.3% |
| Norwalk | 10 | 0.3% |
| Lakewood | 10 | 0.3% |
| Arlington | 10 | 0.3% |
| Fayetteville | 10 | 0.3% |
| Rochester | 10 | 0.3% |
| Albany | 10 | 0.3% |
| Other values (557) | 2776 | 96.0% |
Common words
| Value | Count | Frequency (%) |
| city | 98 | 2.5% |
| san | 72 | 1.9% |
| beach | 51 | 1.3% |
| valley | 40 | 1.0% |
| saint | 40 | 1.0% |
| santa | 40 | 1.0% |
| new | 34 | 0.9% |
| fort | 29 | 0.8% |
| park | 25 | 0.7% |
| hills | 24 | 0.6% |
| Other values (598) | 3392 | 88.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2515 | 9.6% |
| e | 2366 | 9.0% |
| n | 2076 | 7.9% |
| o | 2007 | 7.6% |
| r | 1595 | 6.1% |
| i | 1579 | 6.0% |
| l | 1569 | 6.0% |
| t | 1297 | 4.9% |
| s | 1045 | 4.0% |
| (space) | 959 | 3.6% |
| Other values (46) | 9309 | 35.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 21452 | 81.5% |
| Uppercase Letter | 3853 | 14.6% |
| Space Separator | 959 | 3.6% |
| Dash Punctuation | 28 | 0.1% |
| Other Punctuation | 25 | 0.1% |
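The category, script and block tables classify each character by its Unicode properties. You can reproduce the general categories with Python's standard `unicodedata` module, as in this sketch. The label mapping covers only the five categories that occur in this report. `category_counts` is a hypothetical helper, not the profiler's own code, and the example names come from the sample rows of this dataset.

```python
import unicodedata
from collections import Counter

# Map Unicode general-category codes to the labels used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["O'Fallon", "High Point"]))
```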
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2515 | 11.7% |
| e | 2366 | 11.0% |
| n | 2076 | 9.7% |
| o | 2007 | 9.4% |
| r | 1595 | 7.4% |
| i | 1579 | 7.4% |
| l | 1569 | 7.3% |
| t | 1297 | 6.0% |
| s | 1045 | 4.9% |
| d | 697 | 3.2% |
| Other values (18) | 4706 | 21.9% |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 475 | 12.3% |
| S | 405 | 10.5% |
| B | 293 | 7.6% |
| P | 248 | 6.4% |
| L | 243 | 6.3% |
| M | 237 | 6.2% |
| R | 229 | 5.9% |
| A | 227 | 5.9% |
| F | 170 | 4.4% |
| W | 157 | 4.1% |
| Other values (14) | 1169 | 30.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 20 | 80.0% |
| / | 5 | 20.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 959 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 28 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 25305 | 96.2% |
| Common | 1012 | 3.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 2515 | 9.9% |
| e | 2366 | 9.3% |
| n | 2076 | 8.2% |
| o | 2007 | 7.9% |
| r | 1595 | 6.3% |
| i | 1579 | 6.2% |
| l | 1569 | 6.2% |
| t | 1297 | 5.1% |
| s | 1045 | 4.1% |
| d | 697 | 2.8% |
| Other values (42) | 8559 | 33.8% |
Common
| Value | Count | Frequency (%) |
| (space) | 959 | 94.8% |
| - | 28 | 2.8% |
| ' | 20 | 2.0% |
| / | 5 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 26314 | > 99.9% |
| None | 3 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2515 | 9.6% |
| e | 2366 | 9.0% |
| n | 2076 | 7.9% |
| o | 2007 | 7.6% |
| r | 1595 | 6.1% |
| i | 1579 | 6.0% |
| l | 1569 | 6.0% |
| t | 1297 | 4.9% |
| s | 1045 | 4.0% |
| (space) | 959 | 3.6% |
| Other values (44) | 9306 | 35.4% |
None
| Value | Count | Frequency (%) |
| ü | 2 | 66.7% |
| ó | 1 | 33.3% |
state
Categorical
| Distinct | 49 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.7 KiB |
| California | 676 |
|---|---|
| Texas | 273 |
| Florida | 222 |
| Illinois | 91 |
| Washington | 85 |
| Other values (44) | 1544 |
Length
| Max length | 20 |
|---|---|
| Median length | 13 |
| Mean length | 8.4268419 |
| Min length | 4 |
Characters and Unicode
| Total characters | 24362 |
|---|---|
| Distinct characters | 46 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Maryland |
|---|---|
| 2nd row | Massachusetts |
| 3rd row | Alabama |
| 4th row | California |
| 5th row | New Jersey |
Common Values
| Value | Count | Frequency (%) |
| California | 676 | 23.4% |
| Texas | 273 | 9.4% |
| Florida | 222 | 7.7% |
| Illinois | 91 | 3.1% |
| Washington | 85 | 2.9% |
| Arizona | 80 | 2.8% |
| Colorado | 80 | 2.8% |
| Michigan | 79 | 2.7% |
| North Carolina | 70 | 2.4% |
| Virginia | 70 | 2.4% |
| Other values (39) | 1165 | 40.3% |
Common words
| Value | Count | Frequency (%) |
| california | 676 | 21.2% |
| texas | 273 | 8.6% |
| florida | 222 | 7.0% |
| new | 141 | 4.4% |
| carolina | 94 | 2.9% |
| illinois | 91 | 2.9% |
| washington | 85 | 2.7% |
| arizona | 80 | 2.5% |
| colorado | 80 | 2.5% |
| north | 80 | 2.5% |
| Other values (44) | 1366 | 42.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 3650 | 15.0% |
| i | 3033 | 12.4% |
| o | 2207 | 9.1% |
| n | 2073 | 8.5% |
| r | 1675 | 6.9% |
| l | 1435 | 5.9% |
| s | 1390 | 5.7% |
| e | 1136 | 4.7% |
| C | 894 | 3.7% |
| f | 681 | 2.8% |
| Other values (36) | 6188 | 25.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 20882 | 85.7% |
| Uppercase Letter | 3183 | 13.1% |
| Space Separator | 297 | 1.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 3650 | 17.5% |
| i | 3033 | 14.5% |
| o | 2207 | 10.6% |
| n | 2073 | 9.9% |
| r | 1675 | 8.0% |
| l | 1435 | 6.9% |
| s | 1390 | 6.7% |
| e | 1136 | 5.4% |
| f | 681 | 3.3% |
| t | 580 | 2.8% |
| Other values (14) | 3022 | 14.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 894 | 28.1% |
| M | 341 | 10.7% |
| T | 317 | 10.0% |
| N | 276 | 8.7% |
| F | 222 | 7.0% |
| I | 210 | 6.6% |
| A | 148 | 4.6% |
| W | 130 | 4.1% |
| O | 119 | 3.7% |
| V | 70 | 2.2% |
| Other values (11) | 456 | 14.3% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 297 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 24065 | 98.8% |
| Common | 297 | 1.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 3650 | 15.2% |
| i | 3033 | 12.6% |
| o | 2207 | 9.2% |
| n | 2073 | 8.6% |
| r | 1675 | 7.0% |
| l | 1435 | 6.0% |
| s | 1390 | 5.8% |
| e | 1136 | 4.7% |
| C | 894 | 3.7% |
| f | 681 | 2.8% |
| Other values (35) | 5891 | 24.5% |
Common
| Value | Count | Frequency (%) |
| (space) | 297 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 24362 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 3650 | 15.0% |
| i | 3033 | 12.4% |
| o | 2207 | 9.1% |
| n | 2073 | 8.5% |
| r | 1675 | 6.9% |
| l | 1435 | 5.9% |
| s | 1390 | 5.7% |
| e | 1136 | 4.7% |
| C | 894 | 3.7% |
| f | 681 | 2.8% |
| Other values (36) | 6188 | 25.4% |
median_age
Real number (ℝ)
| Distinct | 180 |
|---|---|
| Distinct (%) | 6.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.494881 |
| Minimum | 22.9 |
|---|---|
| Maximum | 70.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 22.9 |
|---|---|
| 5-th percentile | 28.8 |
| Q1 | 32.8 |
| median | 35.3 |
| Q3 | 38 |
| 95-th percentile | 42.6 |
| Maximum | 70.5 |
| Range | 47.6 |
| Interquartile range (IQR) | 5.2 |
Descriptive statistics
| Standard deviation | 4.4016167 |
|---|---|
| Coefficient of variation (CV) | 0.12400709 |
| Kurtosis | 4.164544 |
| Mean | 35.494881 |
| Median Absolute Deviation (MAD) | 2.6 |
| Skewness | 0.64661844 |
| Sum | 102615.7 |
| Variance | 19.37423 |
| Monotonicity | Not monotonic |
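The quantile and descriptive statistics can be reproduced for any numeric column. This stdlib-only Python sketch assumes pandas-style conventions, which the report does not state explicitly:

- sample standard deviation (ddof = 1);
- linearly interpolated quantiles;
- MAD as the median of absolute deviations from the median;
- the bias-corrected skewness and excess kurtosis that pandas' `Series.skew()` and `Series.kurt()` return.

Missing entries (`None` or NaN) are dropped first, and at least four non-missing values are needed.

```python
import math
import statistics

def describe(values):
    """Summary statistics for one numeric column; None/NaN entries are treated as missing."""
    xs = [float(v) for v in values if v is not None and not math.isnan(v)]
    n = len(xs)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (ddof = 1)
    # "inclusive" interpolates linearly between order statistics, like pandas' default quantile.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    # Population central moments feed the bias-corrected shape statistics.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    excess = m4 / m2 ** 2 - 3
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * excess + 6)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(xs) - min(xs),
        "mad": statistics.median(abs(x - median) for x in xs),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "sum": sum(xs),
        "variance": std ** 2,
    }

stats = describe([1, 2, None, 3, 4, 5])
print(stats["iqr"], stats["mad"], stats["kurtosis"])
```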
Common values
| Value | Count | Frequency (%) |
| 35.7 | 50 | 1.7% |
| 33.4 | 48 | 1.7% |
| 33.1 | 45 | 1.6% |
| 36.8 | 45 | 1.6% |
| 34.1 | 45 | 1.6% |
| 34.5 | 44 | 1.5% |
| 38.1 | 42 | 1.5% |
| 34.6 | 40 | 1.4% |
| 35.3 | 40 | 1.4% |
| 36 | 40 | 1.4% |
| Other values (170) | 2452 | 84.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 22.9 | 5 | 0.2% |
| 23 | 4 | 0.1% |
| 23.5 | 5 | 0.2% |
| 23.6 | 5 | 0.2% |
| 23.9 | 5 | 0.2% |
| 24.2 | 5 | 0.2% |
| 25.5 | 5 | 0.2% |
| 26 | 5 | 0.2% |
| 26.1 | 5 | 0.2% |
| 26.2 | 10 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 70.5 | 3 | 0.1% |
| 48.8 | 5 | 0.2% |
| 47.9 | 4 | 0.1% |
| 47.6 | 4 | 0.1% |
| 47.4 | 5 | 0.2% |
| 47.3 | 5 | 0.2% |
| 47 | 5 | 0.2% |
| 46.9 | 5 | 0.2% |
| 46.8 | 4 | 0.1% |
| 45.9 | 4 | 0.1% |
male_population
Real number (ℝ)
| Distinct | 593 |
|---|---|
| Distinct (%) | 20.5% |
| Missing | 3 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 97328.426 |
| Minimum | 29281 |
|---|---|
| Maximum | 4081698 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 29281 |
|---|---|
| 5-th percentile | 32290 |
| Q1 | 39289 |
| median | 52341 |
| Q3 | 86641.75 |
| 95-th percentile | 296902.6 |
| Maximum | 4081698 |
| Range | 4052417 |
| Interquartile range (IQR) | 47352.75 |
Descriptive statistics
| Standard deviation | 216299.94 |
|---|---|
| Coefficient of variation (CV) | 2.2223717 |
| Kurtosis | 209.81379 |
| Mean | 97328.426 |
| Median Absolute Deviation (MAD) | 15991 |
| Skewness | 12.735597 |
| Sum | 2.810845 × 10⁸ |
| Variance | 4.6785663 × 10¹⁰ |
| Monotonicity | Not monotonic |
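The male_population summary rows are mutually consistent, and recomputing them shows which conventions the report uses. Mean is Sum divided by the 2,888 non-missing values, not all 2,891 rows. CV is the standard deviation divided by the mean. Variance is the squared standard deviation. The check below uses the rounded values printed in this report, so agreement holds only to the displayed precision.

```python
n_rows, n_missing = 2891, 3      # male_population has 3 missing values
reported_sum = 2.810845e8
reported_std = 216299.94

mean = reported_sum / (n_rows - n_missing)  # divide by non-missing values only
cv = reported_std / mean                    # coefficient of variation
variance = reported_std ** 2

print(f"mean={mean:.2f} cv={cv:.5f} variance={variance:.5e}")
```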
Common values
| Value | Count | Frequency (%) |
| 40601 | 10 | 0.3% |
| 33993 | 10 | 0.3% |
| 135455 | 5 | 0.2% |
| 518317 | 5 | 0.2% |
| 149547 | 5 | 0.2% |
| 60989 | 5 | 0.2% |
| 60704 | 5 | 0.2% |
| 83640 | 5 | 0.2% |
| 55550 | 5 | 0.2% |
| 42100 | 5 | 0.2% |
| Other values (583) | 2828 | 97.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 29281 | 5 | 0.2% |
| 29995 | 5 | 0.2% |
| 30007 | 5 | 0.2% |
| 30193 | 4 | 0.1% |
| 30758 | 5 | 0.2% |
| 30799 | 2 | 0.1% |
| 30844 | 5 | 0.2% |
| 30890 | 5 | 0.2% |
| 31019 | 5 | 0.2% |
| 31205 | 5 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4081698 | 5 | 0.2% |
| 1958998 | 5 | 0.2% |
| 1320015 | 5 | 0.2% |
| 1149686 | 5 | 0.2% |
| 786833 | 5 | 0.2% |
| 741270 | 5 | 0.2% |
| 721405 | 5 | 0.2% |
| 693826 | 5 | 0.2% |
| 639019 | 5 | 0.2% |
| 518317 | 5 | 0.2% |
female_population
Real number (ℝ)
| Distinct | 594 |
|---|---|
| Distinct (%) | 20.6% |
| Missing | 3 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 101769.63 |
| Minimum | 27348 |
|---|---|
| Maximum | 4468707 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 27348 |
|---|---|
| 5-th percentile | 34163 |
| Q1 | 41227 |
| median | 53809 |
| Q3 | 89604 |
| 95-th percentile | 315853.35 |
| Maximum | 4468707 |
| Range | 4441359 |
| Interquartile range (IQR) | 48377 |
Descriptive statistics
| Standard deviation | 231564.57 |
|---|---|
| Coefficient of variation (CV) | 2.2753799 |
| Kurtosis | 227.63305 |
| Mean | 101769.63 |
| Median Absolute Deviation (MAD) | 15771 |
| Skewness | 13.320445 |
| Sum | 2.9391069 × 10⁸ |
| Variance | 5.3622151 × 10¹⁰ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 35801 | 10 | 0.3% |
| 41862 | 5 | 0.2% |
| 57422 | 5 | 0.2% |
| 508602 | 5 | 0.2% |
| 151293 | 5 | 0.2% |
| 61247 | 5 | 0.2% |
| 65417 | 5 | 0.2% |
| 92957 | 5 | 0.2% |
| 144323 | 5 | 0.2% |
| 43511 | 5 | 0.2% |
| Other values (584) | 2833 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 27348 | 5 | 0.2% |
| 31238 | 4 | 0.1% |
| 31456 | 4 | 0.1% |
| 32173 | 3 | 0.1% |
| 32397 | 5 | 0.2% |
| 32745 | 5 | 0.2% |
| 32763 | 4 | 0.1% |
| 32799 | 5 | 0.2% |
| 32807 | 5 | 0.2% |
| 32901 | 5 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4468707 | 5 | 0.2% |
| 2012898 | 5 | 0.2% |
| 1400541 | 5 | 0.2% |
| 1148942 | 5 | 0.2% |
| 826172 | 5 | 0.2% |
| 776168 | 5 | 0.2% |
| 748419 | 5 | 0.2% |
| 701081 | 5 | 0.2% |
| 661063 | 5 | 0.2% |
| 508602 | 5 | 0.2% |
total_population
Real number (ℝ)
| Distinct | 594 |
|---|---|
| Distinct (%) | 20.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 198966.78 |
| Minimum | 63215 |
|---|---|
| Maximum | 8550405 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 63215 |
|---|---|
| 5-th percentile | 67271 |
| Q1 | 80429 |
| median | 106782 |
| Q3 | 175232 |
| 95-th percentile | 618619 |
| Maximum | 8550405 |
| Range | 8487190 |
| Interquartile range (IQR) | 94803 |
Descriptive statistics
| Standard deviation | 447555.93 |
|---|---|
| Coefficient of variation (CV) | 2.2494003 |
| Kurtosis | 219.21588 |
| Mean | 198966.78 |
| Median Absolute Deviation (MAD) | 32640 |
| Skewness | 13.044623 |
| Sum | 5.7521296 × 10⁸ |
| Variance | 2.0030631 × 10¹¹ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 68097 | 10 | 0.3% |
| 71024 | 10 | 0.3% |
| 82463 | 5 | 0.2% |
| 104808 | 5 | 0.2% |
| 87873 | 5 | 0.2% |
| 73432 | 5 | 0.2% |
| 248956 | 5 | 0.2% |
| 66872 | 5 | 0.2% |
| 107899 | 5 | 0.2% |
| 451949 | 5 | 0.2% |
| Other values (584) | 2831 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 63215 | 5 | 0.2% |
| 63651 | 5 | 0.2% |
| 63792 | 5 | 0.2% |
| 64609 | 5 | 0.2% |
| 64819 | 4 | 0.1% |
| 64837 | 5 | 0.2% |
| 64962 | 5 | 0.2% |
| 65052 | 4 | 0.1% |
| 65058 | 5 | 0.2% |
| 65065 | 5 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 8550405 | 5 | 0.2% |
| 3971896 | 5 | 0.2% |
| 2720556 | 5 | 0.2% |
| 2298628 | 5 | 0.2% |
| 1567442 | 5 | 0.2% |
| 1563001 | 5 | 0.2% |
| 1469824 | 5 | 0.2% |
| 1394907 | 5 | 0.2% |
| 1300082 | 5 | 0.2% |
| 1026919 | 5 | 0.2% |
number_of_veterans
Real number (ℝ)
| Distinct | 577 |
|---|---|
| Distinct (%) | 20.0% |
| Missing | 13 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9367.8325 |
| Minimum | 416 |
|---|---|
| Maximum | 156961 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 416 |
|---|---|
| 5-th percentile | 1990 |
| Q1 | 3739 |
| median | 5397 |
| Q3 | 9368 |
| 95-th percentile | 29511 |
| Maximum | 156961 |
| Range | 156545 |
| Interquartile range (IQR) | 5629 |
Descriptive statistics
| Standard deviation | 13211.22 |
|---|---|
| Coefficient of variation (CV) | 1.410275 |
| Kurtosis | 39.853818 |
| Mean | 9367.8325 |
| Median Absolute Deviation (MAD) | 2281 |
| Skewness | 5.2959233 |
| Sum | 26960622 |
| Variance | 1.7453633 × 10⁸ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 4211 | 10 | 0.3% |
| 5714 | 10 | 0.3% |
| 5204 | 10 | 0.3% |
| 3397 | 10 | 0.3% |
| 3116 | 10 | 0.3% |
| 3647 | 10 | 0.3% |
| 3063 | 10 | 0.3% |
| 5532 | 10 | 0.3% |
| 3404 | 9 | 0.3% |
| 3027 | 9 | 0.3% |
| Other values (567) | 2780 | 96.2% |
| (Missing) | 13 | 0.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 416 | 3 | 0.1% |
| 629 | 5 | 0.2% |
| 693 | 4 | 0.1% |
| 705 | 5 | 0.2% |
| 724 | 5 | 0.2% |
| 776 | 4 | 0.1% |
| 780 | 5 | 0.2% |
| 897 | 5 | 0.2% |
| 1066 | 4 | 0.1% |
| 1101 | 5 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 156961 | 5 | 0.2% |
| 109089 | 5 | 0.2% |
| 92489 | 5 | 0.2% |
| 85417 | 5 | 0.2% |
| 75432 | 5 | 0.2% |
| 72388 | 5 | 0.2% |
| 72042 | 5 | 0.2% |
| 71898 | 5 | 0.2% |
| 61995 | 5 | 0.2% |
| 54995 | 5 | 0.2% |
foreign_born
Real number (ℝ)
| Distinct | 587 |
|---|---|
| Distinct (%) | 20.4% |
| Missing | 13 |
| Missing (%) | 0.4% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 40653.599 |
| Minimum | 861 |
|---|---|
| Maximum | 3212500 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 861 |
|---|---|
| 5-th percentile | 3215 |
| Q1 | 9224 |
| median | 18822 |
| Q3 | 33971.75 |
| 95-th percentile | 109222.15 |
| Maximum | 3212500 |
| Range | 3211639 |
| Interquartile range (IQR) | 24747.75 |
Descriptive statistics
| Standard deviation | 155749.1 |
|---|---|
| Coefficient of variation (CV) | 3.8311271 |
| Kurtosis | 310.38784 |
| Mean | 40653.599 |
| Median Absolute Deviation (MAD) | 11101 |
| Skewness | 16.355795 |
| Sum | 1.1700106 × 10⁸ |
| Variance | 2.4257783 × 10¹⁰ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 5757 | 10 | 0.3% |
| 13409 | 10 | 0.3% |
| 30908 | 5 | 0.2% |
| 56514 | 5 | 0.2% |
| 21959 | 5 | 0.2% |
| 8948 | 5 | 0.2% |
| 10599 | 5 | 0.2% |
| 24503 | 5 | 0.2% |
| 9257 | 5 | 0.2% |
| 6630 | 5 | 0.2% |
| Other values (577) | 2818 | 97.5% |
| (Missing) | 13 | 0.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 861 | 5 | 0.2% |
| 1058 | 5 | 0.2% |
| 1062 | 4 | 0.1% |
| 1224 | 5 | 0.2% |
| 1531 | 5 | 0.2% |
| 1699 | 5 | 0.2% |
| 1789 | 5 | 0.2% |
| 1815 | 5 | 0.2% |
| 1884 | 5 | 0.2% |
| 2064 | 5 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 3212500 | 5 | 0.2% |
| 1485425 | 5 | 0.2% |
| 696210 | 5 | 0.2% |
| 573463 | 5 | 0.2% |
| 401493 | 5 | 0.2% |
| 373842 | 5 | 0.2% |
| 326825 | 5 | 0.2% |
| 300702 | 5 | 0.2% |
| 297199 | 5 | 0.2% |
| 260789 | 5 | 0.2% |
average_household_size
Real number (ℝ)
| Distinct | 161 |
|---|---|
| Distinct (%) | 5.6% |
| Missing | 16 |
| Missing (%) | 0.6% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.7425426 |
| Minimum | 2 |
|---|---|
| Maximum | 4.98 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2.22 |
| Q1 | 2.43 |
| median | 2.65 |
| Q3 | 2.95 |
| 95-th percentile | 3.58 |
| Maximum | 4.98 |
| Range | 2.98 |
| Interquartile range (IQR) | 0.52 |
Descriptive statistics
| Standard deviation | 0.43329109 |
|---|---|
| Coefficient of variation (CV) | 0.15798883 |
| Kurtosis | 2.8619082 |
| Mean | 2.7425426 |
| Median Absolute Deviation (MAD) | 0.24 |
| Skewness | 1.4095639 |
| Sum | 7884.81 |
| Variance | 0.18774117 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 2.4 | 78 | 2.7% |
| 2.72 | 68 | 2.4% |
| 2.97 | 55 | 1.9% |
| 2.39 | 54 | 1.9% |
| 2.64 | 54 | 1.9% |
| 2.41 | 50 | 1.7% |
| 2.68 | 50 | 1.7% |
| 2.52 | 49 | 1.7% |
| 2.73 | 48 | 1.7% |
| 2.55 | 45 | 1.6% |
| Other values (151) | 2324 | 80.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 2 | 5 | 0.2% |
| 2.06 | 5 | 0.2% |
| 2.08 | 10 | 0.3% |
| 2.1 | 4 | 0.1% |
| 2.11 | 5 | 0.2% |
| 2.12 | 5 | 0.2% |
| 2.13 | 15 | 0.5% |
| 2.15 | 10 | 0.3% |
| 2.16 | 9 | 0.3% |
| 2.17 | 10 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4.98 | 5 | 0.2% |
| 4.78 | 4 | 0.1% |
| 4.58 | 5 | 0.2% |
| 4.57 | 3 | 0.1% |
| 4.43 | 4 | 0.1% |
| 4.15 | 5 | 0.2% |
| 4.13 | 5 | 0.2% |
| 4.08 | 10 | 0.3% |
| 3.97 | 5 | 0.2% |
| 3.93 | 5 | 0.2% |
state_code
Categorical
| Distinct | 49 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.7 KiB |
| CA | 676 |
|---|---|
| TX | 273 |
| FL | 222 |
| IL | 91 |
| WA | 85 |
| Other values (44) | 1544 |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 5782 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | MD |
|---|---|
| 2nd row | MA |
| 3rd row | AL |
| 4th row | CA |
| 5th row | NJ |
Common Values
| Value | Count | Frequency (%) |
| CA | 676 | 23.4% |
| TX | 273 | 9.4% |
| FL | 222 | 7.7% |
| IL | 91 | 3.1% |
| WA | 85 | 2.9% |
| AZ | 80 | 2.8% |
| CO | 80 | 2.8% |
| MI | 79 | 2.7% |
| NC | 70 | 2.4% |
| VA | 70 | 2.4% |
| Other values (39) | 1165 | 40.3% |
Common words
| Value | Count | Frequency (%) |
| ca | 676 | 23.4% |
| tx | 273 | 9.4% |
| fl | 222 | 7.7% |
| il | 91 | 3.1% |
| wa | 85 | 2.9% |
| az | 80 | 2.8% |
| co | 80 | 2.8% |
| mi | 79 | 2.7% |
| nc | 70 | 2.4% |
| va | 70 | 2.4% |
| Other values (39) | 1165 | 40.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 1210 | 20.9% |
| C | 894 | 15.5% |
| N | 425 | 7.4% |
| T | 414 | 7.2% |
| L | 387 | 6.7% |
| M | 341 | 5.9% |
| I | 339 | 5.9% |
| X | 273 | 4.7% |
| O | 244 | 4.2% |
| F | 222 | 3.8% |
| Other values (14) | 1033 | 17.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 5782 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 1210 | 20.9% |
| C | 894 | 15.5% |
| N | 425 | 7.4% |
| T | 414 | 7.2% |
| L | 387 | 6.7% |
| M | 341 | 5.9% |
| I | 339 | 5.9% |
| X | 273 | 4.7% |
| O | 244 | 4.2% |
| F | 222 | 3.8% |
| Other values (14) | 1033 | 17.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5782 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| A | 1210 | 20.9% |
| C | 894 | 15.5% |
| N | 425 | 7.4% |
| T | 414 | 7.2% |
| L | 387 | 6.7% |
| M | 341 | 5.9% |
| I | 339 | 5.9% |
| X | 273 | 4.7% |
| O | 244 | 4.2% |
| F | 222 | 3.8% |
| Other values (14) | 1033 | 17.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5782 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| A | 1210 | 20.9% |
| C | 894 | 15.5% |
| N | 425 | 7.4% |
| T | 414 | 7.2% |
| L | 387 | 6.7% |
| M | 341 | 5.9% |
| I | 339 | 5.9% |
| X | 273 | 4.7% |
| O | 244 | 4.2% |
| F | 222 | 3.8% |
| Other values (14) | 1033 | 17.9% |
race
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.7 KiB |
| Hispanic or Latino | 596 |
|---|---|
| White | 589 |
| Black or African-American | 584 |
| Asian | 583 |
| American Indian and Alaska Native | 539 |
Length
| Max length | 33 |
|---|---|
| Median length | 25 |
| Mean length | 16.940505 |
| Min length | 5 |
Characters and Unicode
| Total characters | 48975 |
|---|---|
| Distinct characters | 26 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Hispanic or Latino |
|---|---|
| 2nd row | White |
| 3rd row | Asian |
| 4th row | Black or African-American |
| 5th row | White |
Common Values
| Value | Count | Frequency (%) |
| Hispanic or Latino | 596 | 20.6% |
| White | 589 | 20.4% |
| Black or African-American | 584 | 20.2% |
| Asian | 583 | 20.2% |
| American Indian and Alaska Native | 539 | 18.6% |
Common words
| Value | Count | Frequency (%) |
| or | 1180 | 15.9% |
| hispanic | 596 | 8.0% |
| latino | 596 | 8.0% |
| white | 589 | 8.0% |
| black | 584 | 7.9% |
| african-american | 584 | 7.9% |
| asian | 583 | 7.9% |
| american | 539 | 7.3% |
| indian | 539 | 7.3% |
| and | 539 | 7.3% |
| Other values (2) | 1078 | 14.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 6761 | 13.8% |
| i | 5745 | 11.7% |
| n | 5099 | 10.4% |
| (space) | 4516 | 9.2% |
| r | 2887 | 5.9% |
| c | 2887 | 5.9% |
| A | 2829 | 5.8% |
| e | 2251 | 4.6% |
| o | 1776 | 3.6% |
| t | 1724 | 3.5% |
| Other values (16) | 12500 | 25.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 37603 | 76.8% |
| Uppercase Letter | 6272 | 12.8% |
| Space Separator | 4516 | 9.2% |
| Dash Punctuation | 584 | 1.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 6761 | 18.0% |
| i | 5745 | 15.3% |
| n | 5099 | 13.6% |
| r | 2887 | 7.7% |
| c | 2887 | 7.7% |
| e | 2251 | 6.0% |
| o | 1776 | 4.7% |
| t | 1724 | 4.6% |
| s | 1718 | 4.6% |
| l | 1123 | 3.0% |
| Other values (7) | 5632 | 15.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 2829 | 45.1% |
| L | 596 | 9.5% |
| H | 596 | 9.5% |
| W | 589 | 9.4% |
| B | 584 | 9.3% |
| I | 539 | 8.6% |
| N | 539 | 8.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 4516 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 584 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 43875 | 89.6% |
| Common | 5100 | 10.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 6761 | 15.4% |
| i | 5745 | 13.1% |
| n | 5099 | 11.6% |
| r | 2887 | 6.6% |
| c | 2887 | 6.6% |
| A | 2829 | 6.4% |
| e | 2251 | 5.1% |
| o | 1776 | 4.0% |
| t | 1724 | 3.9% |
| s | 1718 | 3.9% |
| Other values (14) | 10198 | 23.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 4516 | 88.5% |
| - | 584 | 11.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 48975 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 6761 | 13.8% |
| i | 5745 | 11.7% |
| n | 5099 | 10.4% |
| (space) | 4516 | 9.2% |
| r | 2887 | 5.9% |
| c | 2887 | 5.9% |
| A | 2829 | 5.8% |
| e | 2251 | 4.6% |
| o | 1776 | 3.6% |
| t | 1724 | 3.5% |
| Other values (16) | 12500 | 25.5% |
count
Real number (ℝ)
| Distinct | 2785 |
|---|---|
| Distinct (%) | 96.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48963.774 |
| Minimum | 98 |
|---|---|
| Maximum | 3835726 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 22.7 KiB |
Quantile statistics
| Minimum | 98 |
|---|---|
| 5-th percentile | 778.5 |
| Q1 | 3435 |
| median | 13780 |
| Q3 | 54447 |
| 95-th percentile | 162670.5 |
| Maximum | 3835726 |
| Range | 3835628 |
| Interquartile range (IQR) | 51012 |
Descriptive statistics
| Standard deviation | 144385.59 |
|---|---|
| Coefficient of variation (CV) | 2.9488247 |
| Kurtosis | 246.57282 |
| Mean | 48963.774 |
| Median Absolute Deviation (MAD) | 12231 |
| Skewness | 12.973526 |
| Sum | 1.4155427 × 10⁸ |
| Variance | 2.0847198 × 10¹⁰ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
| 535 | 3 | 0.1% |
| 713 | 3 | 0.1% |
| 1343 | 3 | 0.1% |
| 876 | 3 | 0.1% |
| 6547 | 3 | 0.1% |
| 1615 | 3 | 0.1% |
| 251 | 3 | 0.1% |
| 881 | 3 | 0.1% |
| 1713 | 2 | 0.1% |
| 906 | 2 | 0.1% |
| Other values (2775) | 2863 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 98 | 1 | < 0.1% |
| 128 | 1 | < 0.1% |
| 158 | 1 | < 0.1% |
| 182 | 1 | < 0.1% |
| 203 | 1 | < 0.1% |
| 204 | 1 | < 0.1% |
| 211 | 1 | < 0.1% |
| 216 | 1 | < 0.1% |
| 219 | 1 | < 0.1% |
| 227 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 3835726 | 1 | < 0.1% |
| 2485125 | 1 | < 0.1% |
| 2192248 | 1 | < 0.1% |
| 2177650 | 1 | < 0.1% |
| 1936732 | 1 | < 0.1% |
| 1386389 | 1 | < 0.1% |
| 1374535 | 1 | < 0.1% |
| 1304564 | 1 | < 0.1% |
| 1240092 | 1 | < 0.1% |
| 1161455 | 1 | < 0.1% |
updated_at
Date
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 22.7 KiB |
| Minimum | 2022-12-10 21:24:34.727669 |
|---|---|
| Maximum | 2022-12-10 21:24:34.727669 |
Correlations
Auto
The auto setting selects an interpretable pairwise metric for each pair of columns according to the following mapping (variable type / variable type: method, range):
- Categorical / Categorical: Cramér's V, [0, 1]
- Numerical / Categorical: Cramér's V, [0, 1] (using a discretized numerical column)
- Numerical / Numerical: Spearman's ρ, [-1, 1]
This configuration uses the recommended metric for each pair of columns.
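The Auto mapping amounts to a small dispatch on the pair of column types. In this Python sketch, numerical columns are discretized into equal-width bins before Cramér's V would be applied. The equal-width scheme and the default of 10 bins are assumptions, since the report does not say how the profiler discretizes. Both helpers are hypothetical.

```python
def auto_method(kind_a, kind_b):
    """Pick the pairwise metric from the Auto mapping ("Numerical" or "Categorical")."""
    if kind_a == "Numerical" and kind_b == "Numerical":
        return "spearman"
    return "cramers_v"  # any pair that involves a categorical column

def equal_width_bins(values, n_bins=10):
    """Discretize a numeric column into bin labels 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(auto_method("Numerical", "Numerical"))    # spearman
print(auto_method("Numerical", "Categorical"))  # cramers_v
print(equal_width_bins([22.9, 35.3, 70.5], n_bins=4))
```

For example, median_age's minimum, median and maximum (22.9, 35.3 and 70.5) fall into bins 0, 1 and 3 of four equal-width bins.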
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1:

- -1 indicates a perfect negative monotonic relationship;
- 0 indicates no monotonic correlation;
- +1 indicates a perfect positive monotonic relationship.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
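By that definition, Spearman's ρ is Pearson's r applied to the ranks. This minimal Python sketch gives tied values the average of their positions, a common convention assumed here. The example uses a strictly increasing but nonlinear relationship: ρ scores it as 1, while r does not.

```python
import math

def average_ranks(xs):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # strictly increasing but not linear
print(spearman(x, y))  # 1.0 up to floating-point rounding
print(pearson(x, y))   # about 0.943
```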
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1:

- -1 indicates a perfect negative linear relationship;
- 0 indicates no linear correlation;
- +1 indicates a perfect positive linear relationship.

r is invariant under separate changes in location and (positive) scale of the two variables. In particular, for an exact linear relationship Y = a + bX with b ≠ 0, r is +1 or -1 regardless of the slope's magnitude. The angle of the line to the x-axis therefore affects only the sign of r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
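You can check the invariance property numerically. This Python sketch computes r as covariance over the product of standard deviations, then compares it before and after shifting and positively rescaling the variables. The data values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
r = pearson(x, y)

# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])

# An exact linear relationship gives |r| = 1 whatever the slope; only its sign matters.
r_steep = pearson(x, [100 * v + 1 for v in x])
r_negative = pearson(x, [-0.1 * v for v in x])

print(r, r_shifted, r_steep, r_negative)
```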
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1:

- -1 indicates perfect negative association;
- 0 indicates no association;
- +1 indicates perfect positive association.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
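The pair-counting definition translates directly into code. This Python sketch computes the tau-a form described in the text, where tied pairs count as neither concordant nor discordant. Library implementations often default to tau-b, which adjusts the denominator for ties, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
```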
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma in 2013 is used instead.
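Below is a minimal Python sketch of the bias-corrected Cramér's V, following Bergsma's corrections as they are usually stated:

- φ² is reduced by (k - 1)(r - 1)/(n - 1) and clipped at 0;
- the table dimensions r and k are shrunk by (r - 1)²/(n - 1) and (k - 1)²/(n - 1).

The function name is hypothetical, and this is not the profiler's own code. Because state and state_code map one-to-one, the corrected V between them is 1. This is consistent with the High correlation alert for state_code.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma 2013) for two categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)
    # Pearson's chi-squared statistic over every cell of the r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (pair_counts[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bergsma's corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

states = ["California", "Texas", "Florida"] * 20
codes = ["CA", "TX", "FL"] * 20
print(cramers_v_corrected(states, codes))  # 1.0 up to floating-point rounding
```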
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Sample
First rows
| | city | state | median_age | male_population | female_population | total_population | number_of_veterans | foreign_born | average_household_size | state_code | race | count | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Silver Spring | Maryland | 33.8 | 40601.0 | 41862.0 | 82463 | 1562.0 | 30908.0 | 2.60 | MD | Hispanic or Latino | 25924 | 2022-12-10 21:24:34.727669 |
| 1 | Quincy | Massachusetts | 41.0 | 44129.0 | 49500.0 | 93629 | 4147.0 | 32935.0 | 2.39 | MA | White | 58723 | 2022-12-10 21:24:34.727669 |
| 2 | Hoover | Alabama | 38.5 | 38040.0 | 46799.0 | 84839 | 4819.0 | 8229.0 | 2.58 | AL | Asian | 4759 | 2022-12-10 21:24:34.727669 |
| 3 | Rancho Cucamonga | California | 34.5 | 88127.0 | 87105.0 | 175232 | 5821.0 | 33878.0 | 3.18 | CA | Black or African-American | 24437 | 2022-12-10 21:24:34.727669 |
| 4 | Newark | New Jersey | 34.6 | 138040.0 | 143873.0 | 281913 | 5829.0 | 86253.0 | 2.73 | NJ | White | 76402 | 2022-12-10 21:24:34.727669 |
| 5 | Peoria | Illinois | 33.1 | 56229.0 | 62432.0 | 118661 | 6634.0 | 7517.0 | 2.40 | IL | American Indian and Alaska Native | 1343 | 2022-12-10 21:24:34.727669 |
| 6 | Avondale | Arizona | 29.1 | 38712.0 | 41971.0 | 80683 | 4815.0 | 8355.0 | 3.18 | AZ | Black or African-American | 11592 | 2022-12-10 21:24:34.727669 |
| 7 | West Covina | California | 39.8 | 51629.0 | 56860.0 | 108489 | 3800.0 | 37038.0 | 3.56 | CA | Asian | 32716 | 2022-12-10 21:24:34.727669 |
| 8 | O'Fallon | Missouri | 36.0 | 41762.0 | 43270.0 | 85032 | 5783.0 | 3269.0 | 2.77 | MO | Hispanic or Latino | 2583 | 2022-12-10 21:24:34.727669 |
| 9 | High Point | North Carolina | 35.5 | 51751.0 | 58077.0 | 109828 | 5204.0 | 16315.0 | 2.65 | NC | Asian | 11060 | 2022-12-10 21:24:34.727669 |
Last rows
| | city | state | median_age | male_population | female_population | total_population | number_of_veterans | foreign_born | average_household_size | state_code | race | count | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2881 | Gulfport | Mississippi | 35.1 | 33108.0 | 38764.0 | 71872 | 6646.0 | 3072.0 | 2.54 | MS | White | 42870 | 2022-12-10 21:24:34.727669 |
| 2882 | Davis | California | 26.3 | 33493.0 | 34163.0 | 67656 | 2176.0 | 13997.0 | 2.69 | CA | American Indian and Alaska Native | 779 | 2022-12-10 21:24:34.727669 |
| 2883 | Los Angeles | California | 35.0 | 1958998.0 | 2012898.0 | 3971896 | 85417.0 | 1485425.0 | 2.86 | CA | Black or African-American | 404868 | 2022-12-10 21:24:34.727669 |
| 2884 | Mount Vernon | New York | 38.5 | 31876.0 | 36745.0 | 68621 | 2064.0 | 23777.0 | 2.85 | NY | Hispanic or Latino | 9446 | 2022-12-10 21:24:34.727669 |
| 2885 | Lynchburg | Virginia | 28.7 | 38614.0 | 41198.0 | 79812 | 4322.0 | 4364.0 | 2.48 | VA | White | 53727 | 2022-12-10 21:24:34.727669 |
| 2886 | Stockton | California | 32.5 | 150976.0 | 154674.0 | 305650 | 12822.0 | 79583.0 | 3.16 | CA | American Indian and Alaska Native | 19834 | 2022-12-10 21:24:34.727669 |
| 2887 | Southfield | Michigan | 41.6 | 31369.0 | 41808.0 | 73177 | 4035.0 | 4011.0 | 2.27 | MI | American Indian and Alaska Native | 983 | 2022-12-10 21:24:34.727669 |
| 2888 | Indianapolis | Indiana | 34.1 | 410615.0 | 437808.0 | 848423 | 42186.0 | 72456.0 | 2.53 | IN | White | 553665 | 2022-12-10 21:24:34.727669 |
| 2889 | Somerville | Massachusetts | 31.0 | 41028.0 | 39306.0 | 80334 | 2103.0 | 22292.0 | 2.43 | MA | American Indian and Alaska Native | 374 | 2022-12-10 21:24:34.727669 |
| 2890 | Coral Springs | Florida | 37.2 | 63316.0 | 66186.0 | 129502 | 4724.0 | 38552.0 | 3.17 | FL | White | 90896 | 2022-12-10 21:24:34.727669 |
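The High correlation alerts among male_population, female_population and total_population are expected if the total is simply the sum of the two. Here is a quick Python spot check on a few of the sample rows shown above. It covers only these rows, so treat it as supporting evidence rather than proof for the full dataset.

```python
# (city, male_population, female_population, total_population), copied from the sample rows.
rows = [
    ("Silver Spring", 40601, 41862, 82463),
    ("Quincy", 44129, 49500, 93629),
    ("Los Angeles", 1958998, 2012898, 3971896),
    ("Indianapolis", 410615, 437808, 848423),
    ("Coral Springs", 63316, 66186, 129502),
]

# Collect any city whose total differs from male + female.
mismatches = [city for city, male, female, total in rows if male + female != total]
print(mismatches or "total_population == male_population + female_population for every row checked")
```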